[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
WoosukKwon merged 2 commits into main
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request deprecates and removes the BlockSparse Attention feature and the Phi3-Small model, which relied on it. The changes are extensive, touching many files across the attention backends, model registry, and testing infrastructure. My review confirms that the removal is clean and consistent. All references to blocksparse_params, the block-sparse attention implementation, and the Phi3SmallForCausalLM model have been correctly eliminated. The related tests and documentation have also been updated accordingly. The changes look good to me.
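For downstream code adjusting to this change, the shape of the API update is roughly the following. This is a hypothetical sketch, not vLLM's actual class or signature: an attention backend constructor that previously accepted an optional `blocksparse_params` dict simply no longer takes that argument.

```python
class AttentionImpl:
    """Illustrative attention backend wrapper (names are hypothetical,
    not vLLM's actual internals).

    Before this PR, constructors like this accepted an extra
    ``blocksparse_params`` dict; after it, the parameter is gone and
    passing it raises a TypeError.
    """

    def __init__(
        self,
        num_heads: int,
        head_size: int,
        scale: float,
        # blocksparse_params: Optional[dict] = None,  # removed by this PR
    ) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        self.scale = scale


impl = AttentionImpl(num_heads=32, head_size=128, scale=128 ** -0.5)
print(impl.num_heads)  # -> 32
```

Forks that still pass `blocksparse_params` at the call site will fail fast with a `TypeError`, which is the intended signal to drop the argument.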
The kernels test failure is related to this PR.
Upstream PR vllm-project/vllm#21217 changed attention APIs. This PR adjusts our attention implementation to the new API. --------- Signed-off-by: Konrad Zawora <kzawora@habana.ai>
This PR removes block-sparse attention and support for Phi3-Small, the only model that relied on it.
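The practical effect of dropping `Phi3SmallForCausalLM` from the model registry can be sketched as follows. The registry dict and `resolve` helper here are illustrative stand-ins, not vLLM's actual internals:

```python
# Hypothetical model registry after this PR; a removed architecture
# simply has no entry, so lookups for it fail with a clear error.
SUPPORTED_MODELS = {
    "LlamaForCausalLM": "llama",
    # "Phi3SmallForCausalLM": "phi3_small",  # removed by this PR
}


def resolve(architecture: str) -> str:
    """Map an architecture name to its model module key."""
    try:
        return SUPPORTED_MODELS[architecture]
    except KeyError:
        raise ValueError(
            f"Model architecture {architecture!r} is not supported."
        )


print(resolve("LlamaForCausalLM"))  # -> llama
```

Users loading a Phi3-Small checkpoint after this change would hit the unsupported-architecture error rather than a deeper failure inside the removed attention backend.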